natural language query
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
This paper explores the task of interactive image retrieval using natural language queries, where a user progressively provides input queries to refine a set of retrieval results. Moreover, our work explores this problem in the context of complex image scenes containing multiple objects. We propose Drill-down, an effective framework for encoding multiple queries with an efficient compact state representation that significantly extends current methods for single-round image retrieval. We show that using multiple rounds of natural language queries as input can be surprisingly effective for finding arbitrarily specific images of complex scenes. Furthermore, we find that existing image datasets with textual captions can provide a surprisingly effective form of weak supervision for this task. We compare our method with existing sequential encoding and embedding networks, demonstrating superior performance on two proposed benchmarks: automatic image retrieval on a simulated scenario that uses region captions as queries, and interactive image retrieval using real queries from human evaluators.
Detecting Moments and Highlights in Videos via Natural Language Queries
Detecting customized moments and highlights from videos given natural language (NL) user queries is an important but under-studied topic. One of the challenges in pursuing this direction is the lack of annotated data. To address this issue, we present the Query-based Video Highlights (QVHighlights) dataset. It consists of over 10,000 YouTube videos, covering a wide range of topics, from everyday activities and travel in lifestyle vlog videos to social and political activities in news videos. Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t. the query, and (3) five-point scale saliency scores for all query-relevant clips. This comprehensive annotation enables us to develop and evaluate systems that detect relevant moments as well as salient highlights for diverse, flexible user queries. We also present a strong baseline for this task, Moment-DETR, a transformer encoder-decoder model that views moment retrieval as a direct set prediction problem, taking extracted video and query representations as inputs and predicting moment coordinates and saliency scores end-to-end. While our model does not utilize any human prior, we show that it performs competitively when compared to well-engineered architectures. With weakly supervised pretraining using ASR captions, Moment-DETR substantially outperforms previous methods.
- Information Technology > Databases (0.82)
- Information Technology > Artificial Intelligence > Natural Language (0.62)
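The QVHighlights task scores predicted moments against annotated ones by temporal overlap. Moment-DETR learns this end-to-end with a set-prediction (Hungarian matching) loss; the sketch below only illustrates the simpler underlying notion of matching predicted (start, end) spans to ground truth by 1D IoU, with a greedy pairing standing in for the learned matching.

```python
# Illustrative sketch: pairing predicted moments with ground-truth moments by
# temporal IoU. Moment-DETR itself uses a learned Hungarian set-matching loss;
# greedy matching here is a simplification for illustration.

def temporal_iou(a, b):
    """IoU of two (start, end) spans in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def greedy_match(predictions, ground_truth, iou_threshold=0.5):
    """Greedily pair each prediction with the best unmatched ground-truth span."""
    matched, used = [], set()
    for p in predictions:
        best, best_iou = None, iou_threshold
        for i, g in enumerate(ground_truth):
            iou = temporal_iou(p, g)
            if i not in used and iou >= best_iou:
                best, best_iou = i, iou
        if best is not None:
            used.add(best)
            matched.append((p, ground_truth[best], best_iou))
    return matched
```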
GDC Cohort Copilot: An AI Copilot for Curating Cohorts from the Genomic Data Commons
Song, Steven, Subramanyam, Anirudh, Zhang, Zhenyu, Venkat, Aarti, Grossman, Robert L.
The Genomic Data Commons (GDC) provides access to high quality, harmonized cancer genomics data through a unified curation and analysis platform centered around patient cohorts. While GDC users can interactively create complex cohorts through the graphical Cohort Builder, users (especially new ones) may struggle to find specific cohort descriptors across hundreds of possible fields and properties. However, users may be better able to describe their desired cohort in free-text natural language. We introduce GDC Cohort Copilot, an open-source copilot tool for curating cohorts from the GDC. GDC Cohort Copilot automatically generates the GDC cohort filter corresponding to a user-input natural language description of their desired cohort, before exporting the cohort back to the GDC for further analysis. An interactive user interface allows users to further refine the generated cohort. We develop and evaluate multiple large language models (LLMs) for GDC Cohort Copilot and demonstrate that our locally-served, open-source GDC Cohort LLM achieves better results than GPT-4o prompting in generating GDC cohorts. We implement and share GDC Cohort Copilot as a containerized Gradio app on HuggingFace Spaces, available at https://huggingface.co/spaces/uc-ctds/GDC-Cohort-Copilot. GDC Cohort LLM weights are available at https://huggingface.co/uc-ctds. All source code is available at https://github.com/uc-cdis/gdc-cohort-copilot.
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
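The core step in GDC Cohort Copilot is turning a free-text cohort description into a structured cohort filter. A minimal sketch of that step, with the LLM replaced by a keyword lookup: the nested `{"op": ..., "content": ...}` shape loosely follows GDC-style filter JSON, but the exact fields and values the real GDC Cohort LLM emits are an assumption here.

```python
# Minimal sketch of the NL -> cohort-filter step, with the LLM stubbed out as a
# keyword table. FIELD_RULES and the field names are hypothetical illustrations,
# not the actual GDC schema.

FIELD_RULES = {
    "lung": ("cases.primary_site", "bronchus and lung"),
    "female": ("cases.demographic.gender", "female"),
}

def nl_to_cohort_filter(description: str) -> dict:
    """Map a natural-language cohort description to a nested filter dict."""
    clauses = [
        {"op": "in", "content": {"field": field, "value": [value]}}
        for keyword, (field, value) in FIELD_RULES.items()
        if keyword in description.lower()
    ]
    return {"op": "and", "content": clauses}
```

In the real system the generated filter is rendered in an interactive UI so the user can refine it before exporting the cohort back to the GDC.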
LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources
Ji, Haichao, Wang, Zibo, Pan, Cheng, Han, Meng, Zhu, Yifei, Wang, Dan, Han, Zhu
Large Language Models (LLMs) have shown great promise in automating data analytics tasks by interpreting natural language queries and generating multi-operation execution plans. However, existing LLM-agent-based analytics frameworks operate under the assumption of centralized data access, offering little to no privacy protection. In contrast, federated analytics (FA) enables privacy-preserving computation across distributed data sources, but lacks support for natural language input and requires structured, machine-readable queries. In this work, we present LAFA, the first system that integrates LLM-agent-based data analytics with FA. LAFA introduces a hierarchical multi-agent architecture that accepts natural language queries and transforms them into optimized, executable FA workflows. To improve execution efficiency, an optimizer agent rewrites and merges multiple DAGs, eliminating redundant operations and minimizing computational and communication overhead. Our experiments demonstrate that LAFA consistently outperforms baseline prompting strategies by achieving higher execution plan success rates and reducing resource-intensive FA operations by a substantial margin. This work establishes a practical foundation for privacy-preserving, LLM-driven analytics that supports natural language input in the FA setting. The rapid development of Large Language Models (LLMs) has offered unprecedented capabilities in natural language understanding, reasoning, and planning [1], significantly transforming the landscape of data analytics. LLMs can interpret complex analytical intents, generate structured code, and orchestrate multi-step tasks by interacting with external environments such as databases and computation sandboxes. These capabilities have led to the emergence of LLM-based agents that decompose high-level queries, plan analytical workflows, and execute or verify results through tool interactions.
- North America > United States > California (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Texas > Harris County > Houston (0.14)
- (14 more...)
- Workflow (0.97)
- Research Report (0.82)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
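The optimizer agent described above rewrites and merges multiple DAGs to eliminate redundant operations. A minimal sketch of that deduplication idea, under the assumption that operations can be modeled as `(node_id, op_name, input_ids)` nodes; the real LAFA plans carry far richer operation metadata.

```python
# Sketch of DAG merging: structurally identical federated operations across
# several plans are collapsed into a single node, so each expensive FA
# operation runs once. Node/operation names are illustrative.

def merge_dags(dags):
    """Merge DAGs given as lists of (node_id, op, input_ids), deduplicating
    nodes whose op and (already-merged) inputs are identical."""
    canonical = {}   # (op, merged input ids) -> merged node id
    remap = {}       # original node id -> merged node id
    merged = []
    for dag in dags:
        for node_id, op, inputs in dag:
            key = (op, tuple(remap[i] for i in inputs))
            if key not in canonical:
                new_id = len(merged)
                canonical[key] = new_id
                merged.append((new_id, op, key[1]))
            remap[node_id] = canonical[key]
    return merged
```

Two workflows that both load the same source and compute the same secure sum before diverging would share those first two nodes after merging.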
GQVis: A Dataset of Genomics Data Questions and Visualizations for Generative AI
Walters, Skylar Sargent, Valderrama, Arthea, Smits, Thomas C., Kouřil, David, Nguyen, Huyen N., L'Yi, Sehi, Lange, Devin, Gehlenborg, Nils
Data visualization is a fundamental tool in genomics research, enabling the exploration, interpretation, and communication of complex genomic features. While machine learning models show promise for transforming data into insightful visualizations, current models lack the training foundation for domain-specific tasks. In an effort to provide a foundational resource for genomics-focused model training, we present a framework for generating a dataset that pairs abstract, low-level questions about genomics data with corresponding visualizations. Building on prior work with statistical plots, our approach adapts to the complexity of genomics data and the specialized representations used to depict them. We further incorporate multiple linked queries and visualizations, along with justifications for design choices, figure captions, and image alt-texts for each item in the dataset. We use genomics data retrieved from three distinct genomics data repositories (4DN, ENCODE, Chromoscope) to produce GQVis: a dataset consisting of 1.14 million single-query data points, 628k query pairs, and 589k query chains. The GQVis dataset and generation code are available at https://huggingface.co/datasets/HIDIVE/GQVis and https://github.com/hms-dbmi/GQVis-Generation.
- North America > United States > Massachusetts > Middlesex County > Lowell (0.04)
- Asia > Middle East > Jordan (0.04)
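Each GQVis item pairs a question with a visualization plus supporting text (justification, caption, alt-text), and items can be linked into pairs and chains. A sketch of what one such record might hold; the field names here are illustrative, not the dataset's actual schema.

```python
# Hypothetical record structure for one GQVis-style data point. Field names
# are assumptions for illustration only.

from dataclasses import dataclass, field

@dataclass
class GenomicsVisRecord:
    question: str                 # abstract, low-level question about the data
    vis_spec: dict                # declarative visualization specification
    justification: str            # why this visual encoding fits the question
    caption: str                  # figure caption
    alt_text: str                 # image alt-text for accessibility
    linked_queries: list = field(default_factory=list)  # follow-ups forming a chain

record = GenomicsVisRecord(
    question="How does read coverage vary across the first megabase of chr1?",
    vis_spec={"mark": "bar", "x": "position", "y": "coverage"},
    justification="A bar track shows per-bin coverage along a genomic axis.",
    caption="Read coverage across the first megabase of chromosome 1.",
    alt_text="Bar chart of read coverage along chr1.",
)
```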
Agentic NL2SQL to Reduce Computational Costs
Jehle, Dominik, Purucker, Lennart, Hutter, Frank
Translating natural language queries into SQL queries (NL2SQL or Text-to-SQL) has recently been empowered by large language models (LLMs). Using LLMs to perform NL2SQL on a large collection of SQL databases necessitates processing large quantities of meta-information about the databases, which in turn results in lengthy prompts with many tokens and high processing costs. To address this challenge, we introduce Datalake Agent, an agentic system designed to enable an LLM to solve NL2SQL tasks more efficiently. Instead of utilizing direct solvers for NL2SQL that call the LLM once with all meta-information in the prompt, the Datalake Agent employs an interactive loop to reduce the utilized meta-information. Within the loop, the LLM is used in a reasoning framework that selectively requests only the necessary information to solve a table question answering task. We evaluate the Datalake Agent on a collection of 23 databases with 100 table question answering tasks. The Datalake Agent reduces the tokens used by the LLM by up to 87% and thus allows for substantial cost reductions while maintaining competitive performance.
- Europe > Germany > Baden-Württemberg > Freiburg (0.06)
- North America > United States (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
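The interactive loop above can be sketched as follows: instead of packing every database schema into one prompt, the model requests only the metadata it needs before emitting SQL. The LLM is stubbed here as a simple rule, and the table and schema names are hypothetical.

```python
# Sketch of the Datalake Agent's interactive loop with the LLM replaced by a
# stub. Only the schemas the "model" asks for are ever sent, which is the
# source of the token savings.

def stub_llm(question, known_schemas):
    """Stand-in for the LLM: request a missing schema, else answer with SQL."""
    table = "sales" if "revenue" in question else "users"
    if table not in known_schemas:
        return {"action": "request_schema", "table": table}
    return {"action": "answer", "sql": f"SELECT SUM(amount) FROM {table}"}

def datalake_loop(question, catalog, max_turns=5):
    known_schemas, tokens_sent = {}, 0
    for _ in range(max_turns):
        step = stub_llm(question, known_schemas)
        if step["action"] == "request_schema":
            schema = catalog[step["table"]]
            known_schemas[step["table"]] = schema
            tokens_sent += len(schema.split())  # crude token count
        else:
            return step["sql"], tokens_sent
    raise RuntimeError("no answer within turn budget")
```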
Agentic generative AI for media content discovery at the National Football League
Wang, Henry, Salekin, Md Sirajus, Lee, Jake, Claytor, Ross, Zhang, Shinan, Chi, Michael
Generative AI has unlocked new possibilities in content discovery and management. Through collaboration with the National Football League (NFL), we demonstrate how a generative-AI based workflow allows media researchers and analysts to query relevant historical plays using natural language, rather than using traditional filter and click-based interfaces. The agentic workflow takes a user query in natural language as an input, dissects the query into different elements, and then translates these elements into the underlying database query language. The accuracy and latency of retrieval are further improved through carefully designed semantic caching. The solution performs with over 95-percent accuracy and reduces the average time of finding relevant videos from 10 minutes to 30 seconds, significantly increasing the NFL's operational efficiency and allowing users to focus more on producing creative content and engaging storylines.
- Workflow (0.57)
- Press Release (0.42)
- Research Report (0.40)
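The semantic caching mentioned above can be sketched as an embedding-similarity lookup: if a new natural-language query is close enough to a previously answered one, the cached database query is reused instead of re-running the full agentic workflow. The toy bag-of-words embedding below stands in for a real sentence encoder, and the threshold is an assumed tuning knob.

```python
# Sketch of a semantic cache for NL -> database-query translation. A bag-of-
# words cosine similarity is a stand-in for a real sentence-embedding model.

import math

def embed(text):
    words = text.lower().split()
    return {w: words.count(w) for w in set(words)}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.entries, self.threshold = [], threshold

    def get(self, query):
        vec = embed(query)
        for cached_vec, result in self.entries:
            if cosine(vec, cached_vec) >= self.threshold:
                return result
        return None  # cache miss: run the full agentic workflow instead

    def put(self, query, result):
        self.entries.append((embed(query), result))
```

A near-duplicate query then hits the cache, which is where the reported latency reduction from minutes to seconds largely comes from.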
CoDA: Agentic Systems for Collaborative Data Visualization
Chen, Zichen, Chen, Jiefeng, Arik, Sercan Ö., Sra, Misha, Pfister, Tomas, Yoon, Jinsung
Deep research has revolutionized data analysis, yet data scientists still devote substantial time to manually crafting visualizations, highlighting the need for robust automation from natural language queries. However, current systems struggle with complex datasets containing multiple files and iterative refinement. Existing approaches, including simple single- or multi-agent systems, often oversimplify the task, focusing on initial query parsing while failing to robustly manage data complexity, code errors, or final visualization quality. In this paper, we reframe this challenge as a collaborative multi-agent problem. We introduce CoDA, a multi-agent system that employs specialized LLM agents for metadata analysis, task planning, code generation, and self-reflection. We formalize this pipeline, demonstrating how metadata-focused analysis bypasses token limits and quality-driven refinement ensures robustness. Extensive evaluations show CoDA achieves substantial gains in the overall score, outperforming competitive baselines by up to 41.5%. This work demonstrates that the future of visualization automation lies not in isolated code generation but in integrated, collaborative agentic workflows.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- Asia > Middle East > Jordan (0.04)
- Workflow (0.68)
- Research Report (0.64)
- Overview (0.46)
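The CoDA pipeline of specialized agents can be sketched as sequential stages with a self-reflection gate that sends flawed output back for another attempt. Each stage is stubbed as a plain function here; in the paper they are LLM agents, and the metadata stage is what keeps prompts within token limits by summarizing data rather than passing it raw.

```python
# Sketch of a CoDA-style collaborative pipeline: metadata analysis -> planning
# -> code generation -> self-reflection. All stages are illustrative stubs.

def analyze_metadata(dataset):
    # Summarize columns instead of passing raw rows, keeping prompts small.
    return {"columns": sorted({k for row in dataset for k in row})}

def plan_task(meta, query):
    return {"chart": "bar", "x": meta["columns"][0], "goal": query}

def generate_code(plan):
    return f"plot_{plan['chart']}(x={plan['x']!r})"

def reflect(code):
    # Quality gate: here we only check the output is well-formed plotting code.
    return code.startswith("plot_")

def coda_pipeline(dataset, query, max_retries=2):
    meta = analyze_metadata(dataset)
    for _ in range(max_retries + 1):
        code = generate_code(plan_task(meta, query))
        if reflect(code):
            return code
    raise RuntimeError("self-reflection rejected all attempts")
```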
Beamforming-LLM: What, Where and When Did I Miss?
We present Beamforming-LLM, a system that enables users to semantically recall conversations they may have missed in multi-speaker environments. The system combines spatial audio capture using a microphone array with retrieval-augmented generation (RAG) to support natural language queries such as, "What did I miss when I was following the conversation on dogs?" Directional audio streams are separated using beamforming, transcribed with Whisper, and embedded into a vector database using sentence encoders. Upon receiving a user query, semantically relevant segments are retrieved, temporally aligned with non-attended segments, and summarized using a lightweight large language model (GPT-4o-mini). The result is a user-friendly interface that provides contrastive summaries, spatial context, and timestamped audio playback. This work lays the foundation for intelligent auditory memory systems and has broad applications in assistive technology, meeting summarization, and context-aware personal spatial computing.
- North America > United States > New York > New York County > New York City (0.05)
- Asia > Middle East > Israel (0.04)
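The retrieval-and-alignment step described above can be sketched as follows: transcribed, timestamped segments from each beamformed direction are ranked against the user query, restricted to the window the user was not attending. Simple word overlap stands in for the sentence-encoder similarity the system uses, and the segment format is an assumption.

```python
# Sketch of recalling missed conversation segments. segments are hypothetical
# (start_s, end_s, direction, text) tuples produced by beamforming + Whisper.

def overlap_score(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def recall_missed(segments, query, missed_window, top_k=2):
    """Rank segments overlapping the non-attended window by query similarity."""
    lo, hi = missed_window
    candidates = [s for s in segments if s[0] < hi and s[1] > lo]
    ranked = sorted(candidates, key=lambda s: overlap_score(query, s[3]),
                    reverse=True)
    return ranked[:top_k]
```

The retrieved segments would then be summarized by the lightweight LLM and presented with spatial context and timestamped playback.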